Self-Driving Car Engineer Nanodegree

Deep Learning

Project: Build a Traffic Sign Recognition Classifier

In this notebook, a template is provided for you to implement your project functionality in stages. If additional code is required that cannot be included in the notebook, be sure that the Python code is successfully imported and included in your submission. Sections that begin with 'Implementation' in the header indicate where you should begin implementing your project. Note that some sections of the implementation are optional and will be marked with 'Optional' in the header.

In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation. Each section where you will answer a question is preceded by a 'Question' header. Carefully read each question and provide thorough answers in the following text boxes that begin with 'Answer:'. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide.

Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. In addition, Markdown cells can typically be edited by double-clicking the cell to enter edit mode.


Step 1: Dataset Exploration

Visualize the German Traffic Signs Dataset. This is open ended; some suggestions include plotting traffic sign images, plotting the count of each sign, etc. Be creative!

The pickled data is a dictionary with 4 key/value pairs:

  • features -> the images' pixel values, (width, height, channels)
  • labels -> the label of the traffic sign
  • sizes -> the original width and height of the image, (width, height)
  • coords -> coordinates of a bounding box around the sign in the image, (x1, y1, x2, y2). Based on the original image (not the resized version).
In [1]:
# Load pickled data
import pickle
import numpy as np
import urllib.request

# TODO: fill this in based on where you saved the training and testing data
training_file = urllib.request.urlopen("https://www.dropbox.com/s/vfdp152tvhdhnib/train.p?dl=1#")
testing_file = urllib.request.urlopen("https://www.dropbox.com/s/vfdp152tvhdhnib/test.p?dl=1#")

train = pickle.load(training_file)
test = pickle.load(testing_file)
X_train, y_train = train['features'], train['labels']
X_test, y_test = test['features'], test['labels']
In [13]:
### To start off let's do a basic data summary.

# TODO: number of training examples
n_train = len(X_train)

# TODO: number of testing examples
n_test = len(X_test)


# TODO: what's the shape of an image?
image_shape = X_train[0].shape

# TODO: how many classes are in the dataset
n_classes = len(np.unique(y_train))


print("Number of training examples =", n_train)
print("Number of testing examples =", n_test)
print("Image data shape =", image_shape)
print("Number of classes =", n_classes)
Number of training examples = 9472
Number of testing examples = 12630
Image data shape = (1024,)
Number of classes = 2
In [5]:
%matplotlib inline
import matplotlib.pyplot as plt
counts = np.unique(y_train, return_counts=True)
fig = plt.bar(counts[0], counts[1])
plt.show()
In [144]:
#EXAMPLES
import time
printed = set()
for i, image in enumerate(X_train):
    if y_train[i] not in printed:
        print(y_train[i])
        plt.figure()
        plt.imshow(image)
        plt.show()
        printed.add(y_train[i])
(Output: one example image is displayed for each class, with its label, 0 through 42, printed above it.)

Step 2: Design and Test a Model Architecture

Design and implement a deep learning model that learns to recognize traffic signs. Train and test your model on the German Traffic Sign Dataset.

There are various aspects to consider when thinking about this problem:

  • Your model can be derived from a deep feedforward net or a deep convolutional network.
  • Play around with preprocessing techniques (normalization, RGB to grayscale, etc.)
  • Number of examples per label (some have more than others).
  • Generate fake data.
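The "generate fake data" suggestion above can be sketched with simple geometric jitter. This is an illustrative example only (augmentation was not used in this submission), assuming SciPy is available; the `jitter` helper and its parameters are hypothetical:

```python
import numpy as np
from scipy.ndimage import rotate, shift

def jitter(image, rng):
    """Return a randomly rotated (+/-15 deg) and translated (+/-2 px)
    copy of an HxWxC image; a cheap way to generate extra examples."""
    angle = rng.uniform(-15, 15)
    # reshape=False keeps the output the same size as the input
    out = rotate(image, angle, reshape=False, mode='nearest')
    dy, dx = rng.integers(-2, 3, size=2)
    # shift rows and columns, but leave the channel axis alone
    return shift(out, (dy, dx, 0), mode='nearest')

rng = np.random.default_rng(0)
augmented = jitter(np.zeros((32, 32, 3)), rng)
```

Each augmented copy keeps its original label, so under-represented classes could be balanced by generating more jittered copies of their images.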

Here is an example of a published baseline model on this problem. You are not required to be familiar with the approach used in the paper, but it is good practice to try to read papers like these.

Implementation

Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.

Question 1

Describe the techniques used to preprocess the data.

Answer:

To preprocess the data, I first converted all of the images to grayscale, both to simplify the pipeline and to help mitigate the impact of different lighting conditions. Then I normalized each flattened image with sklearn's Normalizer, which rescales every sample to unit norm so all images are on a comparable scale (note this is per-sample normalization rather than zero-mean/unit-variance standardization, which sklearn.preprocessing.scale would provide). Finally, I one-hot encoded the labels to make them compatible with the network's softmax output.

In [2]:
from sklearn.preprocessing import scale
from sklearn.cross_validation import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import Normalizer
from PIL import Image
features = []
for image in X_train:
    features.append(np.array(Image.fromarray(image, 'RGB').convert('L').getdata()))
normalizer = Normalizer()
features = normalizer.fit_transform(np.array(features))

encoder = OneHotEncoder(sparse=False)
new_train = []
for y in y_train:
    new_train.append([y])
labels = encoder.fit_transform(np.array(new_train))
/usr/local/python3-tensorflow-gpu/lib/python3.4/site-packages/sklearn/utils/validation.py:420: DataConversionWarning: Data with input dtype int64 was converted to float64 by the normalize function.
  warnings.warn(msg, DataConversionWarning)

Question 2

Describe how you set up the training, validation and testing data for your model. If you generated additional data, why?

Answer: I used a simple train/test split to divide the original training data into training and validation sets. By default, the function puts 75% of the data into the training set and 25% into the validation set. I think this is reasonable: 25% is large enough to give a reliable accuracy estimate, but small enough not to deprive the model of too much training data.

In [3]:
X_train, x_validate, y_train, y_validate = train_test_split(features, labels)
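As a sanity check on the default split proportions, here is a small illustrative sketch (note it uses the modern sklearn.model_selection module rather than the deprecated sklearn.cross_validation module imported above):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(200).reshape(100, 2)
y = np.arange(100)
# With no test_size given, sklearn defaults to a 75/25 split.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
print(len(X_tr), len(X_val))  # 75 25
```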

Question 3

What does your final architecture look like? (Type of model, layers, sizes, connectivity, etc.) For reference on how to build a deep neural network using TensorFlow, see Deep Neural Network in TensorFlow from the classroom.

Answer: I relied heavily on this tutorial: https://www.tensorflow.org/versions/r0.11/tutorials/mnist/pros/index.html

I created a convolutional neural network:

  • First layer: reshapes the flattened grayscale representation of the image back into a 32x32 image, then applies a 5x5 convolution with 32 features per patch, followed by 2x2 max pooling.
  • Second layer: similar to the first layer, but with 64 features per patch.
  • Third layer (fully connected): takes the entire output of the second layer and generates 1024 features; dropout is then applied.
  • Readout layer: maps the 1024 features to 43 outputs, one per sign class.

In [4]:
import tensorflow as tf
from tqdm import tqdm
sess = tf.InteractiveSession()
def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')

#placeholder data
x = tf.placeholder(tf.float32, shape=[None, 1024])
y_ = tf.placeholder(tf.float32, shape=[None, 43])

#first layer
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
x_image = tf.reshape(x, [-1,32,32,1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

#second layer
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

#final layer
W_fc1 = weight_variable([8 * 8 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 8*8*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

#dropout
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

#readout layer
W_fc2 = weight_variable([1024, 43])
b_fc2 = bias_variable([43])
y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

Question 4

How did you train your model? (Type of optimizer, batch size, epochs, hyperparameters, etc.)

Answer: I tested batch sizes of 64, 128, and 256, and settled on 128. I tried several of TensorFlow's other optimizers, but found that AdamOptimizer yielded the best accuracy. Performance stopped improving at around 10,000 iterations with a learning rate of 0.0001, which was low enough that I didn't see noticeable overfitting. I also tested other combinations of learning rate and iteration count, but found this pairing to be the best.

In [5]:
from sklearn.utils import shuffle

def next_batch(size):
    x, y = shuffle(X_train, y_train, n_samples=size)
    return x, y
    
batch_size = 128
iterations = 10000
learning_rate = 0.0001
#0.974034

#training
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_conv, y_))
train_step = tf.train.AdamOptimizer(learning_rate).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.initialize_all_variables())

for i in tqdm(range(iterations)):
  batch = next_batch(batch_size)
  if i%100 == 0: #updates
    train_accuracy = accuracy.eval(feed_dict={
        x:batch[0], y_: batch[1], keep_prob: 1.0})
    print("step %d, training accuracy %g"%(i, train_accuracy))
  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

#testing (on validation set)
print("validation accuracy %g"%accuracy.eval(feed_dict={
    x: x_validate, y_: y_validate, keep_prob: 1.0}))
step 0, training accuracy 0.0625
step 100, training accuracy 0.0859375
step 200, training accuracy 0.140625
step 300, training accuracy 0.148438
step 400, training accuracy 0.171875
step 500, training accuracy 0.226562
step 600, training accuracy 0.265625
step 700, training accuracy 0.375
step 800, training accuracy 0.421875
step 900, training accuracy 0.445312
step 1000, training accuracy 0.601562
step 1100, training accuracy 0.609375
(tqdm progress-bar output omitted; this particular run was interrupted with a KeyboardInterrupt at around step 1100)

Testing: With >97% accuracy on the validation set, I decided it was time to evaluate on the test set. Performance was, surprisingly, even better than on the validation set, at 99.4%.

In [9]:
features = []
for image in X_test:
    features.append(np.array(Image.fromarray(image, 'RGB').convert('L').getdata()))
features = normalizer.transform(np.array(features))

new_train = []

for y in y_test:
    new_train.append([y])
labels = encoder.transform(np.array(new_train))


print("test accuracy %g"%accuracy.eval(feed_dict={
    x: features, y_: labels, keep_prob: 1.0}))
/usr/local/python3-tensorflow-gpu/lib/python3.4/site-packages/sklearn/utils/validation.py:420: DataConversionWarning: Data with input dtype int64 was converted to float64 by the normalize function.
  warnings.warn(msg, DataConversionWarning)
test accuracy 0.993429

Testing: When applied to real images, the model performed dismally (at least for these images). Out of the five that I tested, it classified only one (do not enter) correctly.

Question 5

What approach did you take in coming up with a solution to this problem?

Answer: I approached the problem in a methodical way, testing different combinations of hyperparameters, optimizers, and pooling to arrive at the best implementation possible for the validation data.


Step 3: Test a Model on New Images

Take several pictures of traffic signs that you find on the web or around you (at least five), and run them through your classifier on your computer to produce example results. The classifier might not recognize some local signs but it could prove interesting nonetheless.

You may find signnames.csv useful as it contains mappings from the class id (integer) to the actual sign name.

Implementation (Question 6-9)

As you can see below, out of the 5 test images I used, only one (do not enter) was classified correctly. For the others, performance was quite dismal. I am not sure why this is, or how the model could score 99% on the test set but only 20% on these images.

Image 1- 20kph, predicted yield. I would think this would be easy to classify, because it's nothing but the sign. I'm not sure why it didn't work. The model was very certain about its prediction, but 20kph didn't appear in the top 5.

Image 2- 70kph, predicted general caution. This sign is photographed from a weird angle (not one you'd see from a car) so it makes sense that it wouldn't predict properly. The model was somewhat certain about its prediction, and 70kph appeared as the third prediction.

Image 3- no entry, predicted no entry. I thought this was pretty straightforward, and the classifier got it right. The model was very certain about this prediction.

Image 4- pedestrian, predicted 30kph. The sign is in full view and I have no idea why it got it so wrong. The model wasn't very certain about the prediction, but pedestrian doesn't appear in the top 5.

Image 5- pedestrian, predicted general caution. This is an American sign, so it makes sense that a classifier trained on German signs would struggle with it. The model was pretty certain about its prediction, but pedestrian didn't appear in the top 5.
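One caveat: the cell below assumes each downloaded image is already 32x32, matching the network's 1024-value input placeholder. For arbitrary web images, a resize step along these lines would be needed first (an illustrative sketch, not the exact preprocessing used in this submission; `prepare_image` is a hypothetical helper):

```python
import numpy as np
from PIL import Image

def prepare_image(path):
    """Load an arbitrary image file, convert it to 32x32 grayscale,
    and flatten it to the 1x1024 row vector the network expects."""
    img = Image.open(path).convert('L').resize((32, 32))
    return np.array(img.getdata(), dtype=np.float64).reshape(1, -1)
```

The returned row vector would still need to be passed through the fitted Normalizer before being fed to the network.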

In [16]:
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import csv
%matplotlib inline

reader = csv.DictReader(open("signnames.csv", "r"))
mapping = {}
for row in reader:
    mapping[row["ClassId"]] = row["SignName"]

    
for image in ["20kph.jpg", "70kph.jpg", "do not enter.jpg", "german-pedestrian.jpg",
              "pedestrian2.jpg"]:
    image = mpimg.imread(image)
    plt.figure()
    plt.imshow(image)
    plt.show()
    image = np.array(Image.fromarray(image, 'RGB').convert('L').getdata())
    image = normalizer.transform(image)
    
    probabilities=y_conv
    softmax_probabilities = tf.nn.softmax(probabilities.eval(feed_dict={x: image, keep_prob: 1.0},
                                                                                  session=sess))
    print("softmax probabilities:")
    plt.figure()
    plt.plot(softmax_probabilities.eval(session = sess)[0])
    plt.show()
    top_five = sess.run(tf.nn.top_k(softmax_probabilities, k=5))
    for i, signtype in enumerate(top_five[1][0]):
        print("Prediction number " + str(i+1) + ": " + mapping[str(signtype)])
/usr/local/python3-tensorflow-gpu/lib/python3.4/site-packages/sklearn/utils/validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  DeprecationWarning)
/usr/local/python3-tensorflow-gpu/lib/python3.4/site-packages/sklearn/utils/validation.py:420: DataConversionWarning: Data with input dtype int64 was converted to float64 by the normalize function.
  warnings.warn(msg, DataConversionWarning)
Prediction number 1: Yield
Prediction number 2: Priority road
Prediction number 3: Ahead only
Prediction number 4: Speed limit (60km/h)
Prediction number 5: Turn left ahead
Prediction number 1: General caution
Prediction number 2: Bumpy road
Prediction number 3: Speed limit (70km/h)
Prediction number 4: Traffic signals
Prediction number 5: Stop
Prediction number 1: No entry
Prediction number 2: Stop
Prediction number 3: Speed limit (70km/h)
Prediction number 4: Roundabout mandatory
Prediction number 5: Speed limit (30km/h)
Prediction number 1: Speed limit (30km/h)
Prediction number 2: Speed limit (70km/h)
Prediction number 3: General caution
Prediction number 4: Road work
Prediction number 5: No entry
Prediction number 1: General caution
Prediction number 2: Keep right
Prediction number 3: Speed limit (70km/h)
Prediction number 4: Roundabout mandatory
Prediction number 5: Speed limit (30km/h)
In [ ]: